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Abstract. We introduce a new wavelet transform suitable for analyzing func- 
tions on point clouds and graphs. Our construction is based on a generalization 
of the average interpolating refinement scheme of Donoho. The most impor- 
tant ingredient of the original scheme that needs to be altered is the choice of 
the interpolant. Here, we define the interpolant as the minimizer of a smooth- 
ness functional, namely a generalization of the Laplacian energy, subject to 
the averaging constraints. In the continuous setting, we derive a formula for 
the optimal solution in terms of the poly-harmonic Green's function. The 
form of this solution is used to motivate our construction in the setting of 
graphs and point clouds. We highlight the empirical convergence of our refine- 
ment scheme and the potential applications of the resulting wavelet transform 
through experiments on a number of data stets. 



1. Introduction 

In many applications, the data naturally comes in the form of functions defined 
on non-standard domains such as point clouds and graphs. Of great practical 
interest are tools suitable for the processing of such functions, including denoising, 
compression, and learning. These tasks have been extensively studied in the setting 
of standard Euclidean domains, where multiscale methods have proven to be quite 
effective. One of the reasons for such effectiveness is, perhaps, the sheer quantity 
of various multiscale constructions available on standard domains. In contrast, on 
non-standard domains, there is a very limited choice of multiscale constructions; 
these include the Haar-like wavelets (T7J |8] , diffusion wavelets [5] , lifting wavelets 
[TO] , and spectral wavelets [pj . 

The main goal of this paper is to increase the assortment of available choices 
by generalizing the average interpolating wavelet scheme of Donoho [7] to point 
clouds and graphs. Our interest in this particular scheme is due to the following 
two reasons. First, as explained in |14j . when analyzing a function consisting purely 
of white noise, the resulting average interpolating wavelet coefficients are basically 
noise, but have roughly the same size at all scales and locations, and are roughly 
independent. These properties make the average interpolating wavelets especially 
useful for denoising. Second, the approach to regression via Haar wavelets proposed 
in [8], can be easily generalized to the average interpolating wavelets, making them 
suitable for machine learning problems. 

The average interpolating scheme is based on predicting function averages on 
smaller children regions based on the given averages on the larger parent region 
and its neighbors. This is achieved by fitting an interpolant to the provided aver- 
ages, and inferring the averages on the children regions from this interpolant. The 
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construction of the interpolant in the setting of the real line is straightforward - 
polynomials provide a natural choice. For example, if one bases the interpolation 
on the immediate neighbors of the parent region (dyadic interval) on the real line, 
there are three averages (two for neighbors, and one for the interval itself) to which 
one can fit a unique quadratic polynomial. If one wants to run the scheme with 
second order neighborhoods (neighbors and neighbors of neighbors), then there are 
five averages to which a fourth degree polynomial can be fit; similarly for higher 
order neighborhoods. Note that a polynomial of the same fixed degree is used 
throughout the construction. 

These conveniences disappear in the case of manifolds, point clouds, and graphs. 
First, there is no straightforward way of defining polynomials on manifolds, which 
means that we need a new interpolation basis. Second, the number of neighborhoods 
at every region can be different, which at first glance means that an interpolation 
basis of different dimension will be needed at each region. However, if we were to 
use such variable bases, the issue of stability would arise. Namely, when trying to 
predict the averages for the children of two neighboring regions, we may end up 
fitting very different interpolants and obtaining very disparate predicted averages 
at the children. This would make obtaining continuous wavelets impossible. 

Thus, the main challenge in obtaining a useful wavelet construction on general 
domains is to design a stable interpolation scheme that can handle varying numbers 
of neighbor regions. An additional requirement is that if all of the averages at the 
parent region and its neighbors are the same, then the interpolant must be constant. 
This is needed in order to ensure that the wavelets have a vanishing moment. To 
overcome this challenge, we define the interpolant as the minimizer of a smoothness 
functional subject to the averaging constraints. In the continuous setting, we derive 
a formula for the optimal solution in terms of the poly-harmonic Green's function. 
The form of this solution is then used to motivate our construction in the setting 
of graphs and point clouds. The resulting discrete interpolant has an extremely 
simple form: it is a linear combination of low-frequency (i.e. corresponding to 
smaller eigenvalues) eigenfunctions of the graph Laplacian. Since there can be 
many linear combinations that satisfy the averaging constraints, we single out a 
linear combination among them by picking the one that minimizes a measure of 
smoothness, such as the spline/Laplacian energy. 

This paper is organized as follows. Section [2] introduces an outline of our con- 
struction in the setting of compact Riemannian manifolds without boundary, the 
main goal being to provide an easy to follow presentation of the main ideas, and 
to motivate some of the choices made later in the discrete setting. Our presenta- 
tion in this section is not meant to be rigorous (e.g. we do not know/prove that 
the refinement scheme converges) , as we are truly interested in the ramifications in 
the discrete case. Section [3] provides the details and algorithms for the setting of 
weighted graphs and point clouds. Section [4] highlights the empirical convergence 
of our refinement scheme and the potential applications of the resulting wavelet 
transform through experiments on a number of data stets. 
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2. Motivation: Closed Manifolds 

2.1. Average Interpolating Wavelets. In this subsection we review the main 
steps of the average interpolating wavelet construction of Donoho [7] with modi- 
fications as needed in the setting of manifolds; our exposition closely follows the 
outline of [14] . 

For a closed Ricmannian manifold M , we first construct a hierarchical dyadic 
partition of M into contiguous regions {i?;^., I = 0, 1, and/c = 1, .... 2 1 }. We start 
by setting i?o.i = M, and then partition i?o,i using an appropriate procedure into 
two regions Ri^i and Ri^2- This process is repeated on each of the obtained regions 
in recursion, by partitioning each region Ri^ into two children regions i?;+i.2fc-i 
and i?;+i.2/c- As a result, we obtain infinitely many regions organized into a binary 
tree. We assume that the sizes (volumes, diameters) of regions at the same level I 
are comparable and converge to zero with an increasing /. For a given region Ri^ at 
level I, we use Vol(Ri t k) to denote its volume. At each level I, we can compute the 
average volume of all of the regions at that level; we denote this average by VolAvi . 
For region i?^ at level we can consider its neighboring regions, i.e. regions at the 
same level I that have a common boundary with Riy, we denote these regions by 
Rl,(k,r) with r = 1, m, where m is the number of neighbors, including the region 
itself; i.e. for convenience, we set (fc, 1) = fc. 

Given an integrable real-valued function / : M — > K, consider a pyramid of 
averages 

Pi,k = Ave{f\R hk } 

of function / over the regions Ri,k- Since we have Ri.k = Ri+i,2k-i^->Ri+i,2k, it 
follows that the pyramid is redundant with the following two scale relation 

[ ' ' M ~ Vol(Ri +h2k -i) + Vol{R l+h2k ) 

In the average interpolating wavelet construction, one tries to predict the aver- 
ages of / at the level I + 1 from the averages at the level I, and the discrepancies 
between predicted and actual averages are recorded as the wavelet coefficients. 

To be more precise, assume that we are given the averages 8i t k where k = l, 2 l 
and we would like to produce the predicted averages at the next level, /Sz+i^, 
k = 1, 2 l+1 . For a fixed region Ri.k, consider all of its neighboring regions R^(k,r)- 
Fit a more or less regular function ni t k : M — » R to the average values provided 
over these regions Ri^k, r ) \ namely, construct a function 717 ^ which satisfies 

(2-2) Ave{wi,k\Ri,(k, r )} = Pl,(k,r), r = 1, ...,m, 

where m is the number of neighbors of our region (remember, the region itself is 
also included in this count, and RiJk,i) = Rl,k)- Now, predict (impute) the averages 
at the two children regions of our region Ri± by setting 

A+i,2fc-i = Ave{Tri k \R l+1: 2k-i} h+i,2k = Ave{n l:k \R l+1 . 2 k}- 

Note that while the function 7T/,fc is global, but it is only fit to the local information 
at and around the given region, and used to predict the averages only at the children 
of the given region; a different function 717^ is needed for each region Ri t k- By 
repeating this procedure for all of the regions at level I, we will obtain predicted 
averages for all of the regions at level I + 1. 
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This two scale refinement scheme allows constructing the average interpolat- 
ing wavelet transform. Consider the differences between the actual and predicted 
averages at level I + 1 and define the wavelet coefficients 

Vol(Ri + i 2k) fo n , , . „ , „; 

oti,k = —i= 1A x (A+i,2fc - A+i,2fc), I > 0, k = l,...,2 l . 
^VolAvi +1 

The normalization in this formula achieves two goals. First, note that there is no 
need to keep the differences for the other child, a', t = Vol ^ Rl + 1 < 2 ±z}> x (R, + 1 2 a--i — 



A+i,2fe-i)> due to the redundancy in our pyramid of averages. Indeed, Eq. (2.1) 
implies that 

(A+l,2fc-l - A+l,2fc-l)FoZ(-R; + l,2fc-l) + (A + l,2fc - A+l,2fc)VoZ(i?i + i )2 fc) = 0, 

or more clearly, due to our scaling, 

(2.3) a hk +a! hk = Q. 

Second, the normalization coefficient has the right order of magnitude in order 
to make these wavelet coefficients similar to the usual wavelet coefficients in that 
the magnitude of "white noise" wavelet coefficients stabilize to have comparable 
magnitudes across levels. 

It can be easily seen that the original function / can be reconstructed given 
/?o,i and all of ai t k- Indeed, this backward wavelet transform process proceeds as 
follows. We first use /?o,i to compute the predicted averages and B\$, using the 
two scale refinement scheme, and then set f5\^ = + £*o,l x v<rf (fli 2) ' @ 1A 
can be uniquely computed using a similar formula and the redundancy relationship 



of Eq. (2.3) between the two children of each region. This gives us the averages at 
level 1 = 1. Repeating this process we can obtain the averages at level 1 = 2, then 
at level I = 3, etc. The function / is given as the limit of these averages. 

Now, note that in order to instantiate the above construction on manifolds, we 
need two ingredients: a method for obtaining the dyadic partition {i?/^}, and a 
method for constructing the interpolants 717^. The next two subsections address 
these issues. 

2.2. Partitioning. In the classical setting of the real line [7j, generating the dyadic 
subdivision is trivial. While this is not as easy in the setting of manifolds, we believe 
that an appropriate procedure can be devised. Consider the general problem of 
partitioning a given region fi C M into two connected regions. One can pick two 
points pi and P2 in Q and define the first region Ox as the set of points closer to 
pi than to pi as measured by the shortest distance within Vl (i.e. the length of 
the shortest path that remains within f2) ; the second region is defined similarly. 
This initial subdivision can be improved by optimizing the choice of points p\ and 
P2 in order to maximize some measure of quality, such as the relative sizes of the 
generated regions. 

Note that since the regions in the partitions get smaller and smaller with the 
increasing level I, after some level it will be possible to identify (e.g. via the expo- 
nential map) a region with a subset of the tangent space of a point within the region. 
Thereafter, it will be possible to apply some of the existing Euclidean approaches, 
such as 2-means (i.e. fc-means clustering algorithm with k = 2), to partition this 
subset of the tangent space and transfer the partition back to the manifold. We 
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anticipate that some of the results in [T6i , such as the existence of upper limits on 
the eccentricities of obtained regions can be transferred into the manifold setting. 

Since the main goal of this paper is to obtain a wavelet construction on discrete 
data, we skip a deeper discussion of this issue in the continuous case, and make an 
assumption that some "good" subdivision of the manifold is given. In practice, as 
discussed in the next section, we use a spectral embedding (somewhat similar to 
the diffusion map) of the manifold into a high-dimensional Euclidean space, and 
recursively subdivide it using the 2-means algorithm. 



2.3. Interpolation. Our interpolant construction is based on the average interpo- 
lating variational splines that appear in Pesenson [T3] . However, the splines of [T3] 
cannot be used directly here because they do not produce a constant interpolant 
when all of the prescribed averages are the same. We modify his formulation in 
order to attain this requirement leading to a vanishing moment, and we show how 
more vanishing moments can be obtained. We also derive formulas for the in- 
terpolant that allow for generalizations and efficient computation in the discrete 
setting. 

Consider the general problem: given disjoint (except for possibly common bound- 
aries) regions {fi r }>!_i on a closed Riemannian manifold M, construct a function 
7r : M — > K which attains the prescribed averages, 

(2.4) Ave{ir\n r } = (3 r , r = l,...,m. 

Let A be the Laplace-Beltrami operator on M; define the interpolant ir as the 
minimizer (in a suitable space of functions) of the following energy 

(A^ 2 tt) 2 

M 



subject to the constraints of Eq. ( 2.4 ); here p is an appropriate positive even integer. 
The objective function is strictly convex over the feasible set, and, so, the problem 
has a unique solution. Note that if all of fa values are the same, then the constant 
function ir — const would be both feasible and make the objective equal to zero, 
which means that the constant function would be the minimizer, as required. 

The choice of the objective function is natural due to the following reasons. 
First, on the real line, by setting p = 2 we obtain the functional J (n") 2 which is 
minimized by cubic splines. Second, this is a generalization of the Laplacian energy 
f M (A7r) 2 which is well-known to provide a measure of smoothness for functions on 
manifolds, see e.g. [3J. As a result, one can say that in some sense our optimization 
problem is finding the smoothest function on the manifold satisfying the averaging 
constraints. 

Next, we show that the solution of this minimization problem can be written in 
terms of the poly-harmonic Green's function, which leads to an efficient solution in 
practice. Our discussion is rather informal, as we will only use the resulting formula 
to guide our construction in the discrete case. 

The following preliminaries will be needed. Let {A„,0„}^L o be the Laplace- 
Beltrami eigenvalues and eigenfunctions; we assume that an orthonormal set of 
eigenfunctions was chosen. Note that for Ao = the corresponding eigenfunction 
(f)Q = const = 1/ \/Vol\M) due to the unit L 2 (M) normalization. Since all eigen- 
functions <f) n for n > are orthogonal to cf) , we obtain that J M <fi n — 0. An L 2 (M) 
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function / : M — > M can be expanded as 

/ = ao<Ao + ai4>i + a 2<j>2 + a 3 (f)3 + ... 

where a n = j M f<j> n - The generalized delta function can be defined by the com- 
pleteness relation S(x,y) = Y^^Lq 4>n{x)(f>n(y) ■ Note that, as usual, it satisfies 

f. OO r. OO 

/ 5(x,y)f(x)dx = ^4> n {y) / <t> n {x)f{x)dx = ^24> n {y)a n = f(y). 

jM n=0 JM n=0 

Consider the p-harmonic kernel 

<j>n(x)(pn(y) 



G(x,y) = J2 



n=l 



An 



We will assume that p is chosen such that the series iMn converges. If A x 

is the Laplacc-Bcltrami operator acting on the variable x, it is easy to verify that 

OO OO 

(2.5) A p x G(x,y) = £ A p 4> n {x)<t> n {y) I K = £ K4>n{x)Mv) I K 

n—1 71—1 
oo ^ 

= J2 M*)Mv) = 5 ( x > v) - vd{M) ■ 

The subtracted term is (f>o(x)<fio(y) = \ j \]V ol{M) x 1/ y/Vol(M) and is needed as 
the summation in the delta function starts at n = 0. Finally, it is easy to see that 

J M G ( x >y) dx = °- 

Now we will derive the optimality conditions for our optimization problem. The 
prescribed average requirements can be rewritten as 

[ n = p r Vol(£l r ), 
Jn r 

which then are incorporated into the problem to get the functional 

£{*)=( (A p/2 i) 2 Tcv(f n-/3 r Vol(n r )), 
Jm r=1 Jn r 

where c r are the Lagrange multipliers (in fact, — c r are the Lagrange multipliers). 
The first variation of the functional £ after dropping the second order terms in Sir 
is 

r. m r. 

6£ = £(n + 6n)-£(n) = 2A p ^ 2 ttA p / 2 (Stt) - V c r / Sir. 

Jm r=1 Jn r 

Now, notice that for any function /, it is true that f Q f(x)dx — f M f{x)J n S(x,y)dydx, 
and so we get 

6£ = [ 2SirA p ir- [ <$7rVc r f 5(x,y)dydx 
Jm Jm r=1 Jn r 

= f Sn[2A p ir- V c r f S(x,y)dy) 
Jm \ r=1 Jn r ) 
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Here we also used integration by parts for the first term and the fact that the 
manifold has no boundary. Setting this equal to zero and remembering that Sir is 
any small variation, we get that 



2A p tt - V c r / S(x, y)dy = 



is satisfied by the optimal function tt. 

To solve this, we make the substitution ir(x) — p(x) + \ 53r=l Cr In G(x, y)dy, 
where G{x,y) is the p-harmonic kernel. Notice that, 



AP f G(x,y)dy = f A*G(x, y)dy = f 
Jn r Jn r Jn 



where we used Eq. (2.5). As a result, we find that p satisfies 

2A>p-Vc Vt *M =Q 
9 h r Vol(M)-°- 

Since we assumed that M is without a boundary, this equality can only be satisfied 
if the constant term in the equation is zero, i.e. 



(2.6) 



J2c r Voi(n r ) = 0, 



r=l 



and then the solution is p = const = Co. 

Based on this, we can write the solution of the optimization problem as 



(2.7) 



m „ 

7T = c + y2c r G(x,y)dy, 
„_i Jar- 



where c r , r = 0, 1, m are some coefficients and G(x, y) is the p-harmonic kernel. 

There are m + 1 coefficients in the formula for 7r, and to determine them uniquely 
we will need as many equations. These equations are provided by the m averaging 



constraints of Eq. (2.4) and the equality of Eq. (2.6). Note that all of these 



equations are linear. Indeed, let A be the symmetric m x m matrix with entries 

A rr > = / G(x, y)dydx, 
Jn r Jn r , 

and let a be an m x 1 vector with entries a r = Vol(Q r ), r = 1, m. We also define 
the m x 1 vector b with entries b r = /3 r Vol(£l r ), r = l,...,m. The linear system 
determining the coefficients in the Eq. (|2.7|) is given by 



., m. 



' 


a 1 ' 




CO 




' " 


a 


A 




c 




b 



where c is the mx 1 vector with entries c r , r = 1, 



Higher vanishing moments. More vanishing moments can be obtained within 
this framework as well. Following [5 , vanishing moments in the manifold setting are 
defined as the wavelets being orthogonal to a Laplace-Bcltrami eigenfunction. For 
example, assume that we would like to obtain an average interpolating construction 
where the wavelets arc orthogonal to the eigenfunction (pi corresponding to the 
eigenvalue Aj. This means that if the interpolation problem is given averages that 
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arise from the function <f>i, i.e. that if the j3 r in Eq. (2.4) are given by f3 r — 
Ave{4>i\Q r }, then the interpolant must be given by 7r = <f>i. 

This can be achieved by defining the interpolant as the minimizer of the following 
functional: 

f (A p / 4 (A - Ai) p/4 tt) 2 . 

J M 

Now, functions n G span{(f> , ipi} make this functional equal to zero, and so, when 
the imposed averages come from a function in span{0 o , <fii}, then that function 
will be uniquely reconstructed. This holds assuming that the number of regions 
in the problem satisfies to > 2, and that there is a gap between Ai and the next 
eigenvalue A2. If the latter is not true, namely if Ai is not a simple eigenvalue, then 
the number of the additional vanishing moments will be equal to the multiplicity 
of Ai. Accordingly, the number of regions in the problem should not be less than 
the total number of vanishing moments. Q 

As before, an explicit form for the solution of this optimization problem can be 
obtained. Without going into the details, we just mention that the solution has the 
form 

m „ 

7T = c_i<^i + c + y~] c r / G(x, y)dy, 
where the kernel G(x, y) is defined by 



G(x,y)-^ 



<j> n (x)(p n (y) 



n=2 



A£ /2 (A„-A!)p/2' 

There are to + 2 coefficients in the formula for 7r, and to determine them uniquely 
we will need as many equations. As before, to + 1 of these equations are provided 



by the m averaging constraints of Eq. (2.4) and the equality of Eq. (2.6). The 



additional equation that must be satisfied is 



/C r (f>i = 0. 
r=l J ^ r 

A straightforward generalization of this technique can be used to obtain further 
vanishing moments. 

3. Discrete Setting 

3.1. Discrete Setup. In order to formulate our construction in the most general 
setting, we consider a simple connected weighted graph G with vertex set V of size 
N: in the following, we identify V with {1, 2, ...,N}. We assume that non-negative 
weights are associated with edges, and, perhaps, with vertices. For a real valued 
function on the graph vertices, / : V — > K., we will denote by f(i) its value on the 
i-th vertex. We sometimes will think of such functions as vectors in M. N . 

Using the edge weights, we can construct the symmetric N x N matrix W which 
has entry Wij equal to the weight associated with the edge connecting the 

vertices i and j; we set Wij = when i and j are not connected with an edge (since 
there are no self-loops, we always have Wu = 0). We also construct the diagonal 



4n the average interpolation construction the required number of neighboring regions can be 
achieved by considering higher order neighborhoods. In addition, in the initial level of the dyadic 
partitioning of the manifold (I = 0) one needs to start with at least as many partitions as the 
number of the vanishing moments. 
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N x N matrix S which captures the weights associated with the vertices: we set 
Sa equal to the weight of the i-th vertex in the graph. If the graph has no vertex 
weights, we can simply set Sa — 1 for all i — 1, N; note that there are other 
options, one of which is setting Sa = Wij, the sum of all edge weights incident 
to the z-th vertex. 

The role that the Laplace-Beltrami operator plays in the continuous case will 
be played in the discrete setting by the graph Laplacian. Let D be the diagonal 
matrix with entries Da — . Wij . The graph Laplacian L is defined as 

L = S- 1 {D-W). 

While our problem setup started with a weighted graph and arrived to the Lapla- 
cian matrix L, our construction can also be applied when one starts with the Lapla- 
cian matrix L and infers from it the weighted graph. This is a natural way of dealing 
with point clouds sampled from a low-dimensional manifold, a setting common in 
manifold learning. There are numerous ways for computing Laplacians on point 
clouds, see [IJ 0J [31 1|}] ; most of them fit into the above form L = S^ 1 (D— W), and 
so, they can be used to infer a weighted graph. Some of the point cloud Lapla- 
cians converge to the Laplace-Beltrami operator of the underlying manifold as the 
density of point sampling is increased, a fact that makes the parallels between our 
discrete and and continuous constructions more apparent. 

3.2. Preliminaries. Given the Laplacian L = S^ 1 (D — W), we will need to com- 
pute its eigenvalues and eigenvectors. Note that L may not be symmetric due to 
the factor of To avoid solving a non-symmetric eigenvalue problem, we solve 
the following symmetric generalized eigenvalue problem 

(D - W)<j) = \S<j). 

Note that the eigenvalues are non-negative, with = Ao < Ai < A2 < A3 < ... < 
Ajv-i- As in the continuous case, we can pick an orthonormal set of eigenvectors 
4> n , n = 0, 1, ...,N — 1, with </> = const. Importantly, these eigenvectors will be 
orthonormal with respect to the S-inner product. Namely, we will have 

(3.1) 4>nS(j)n' = <W, 

where <L„/ is Kronecker's delta. 



An interesting ramification of Eq. (3.1 1 relates to how one converts integrals ap- 



pearing in the continuous setting to the current discrete setting. Indeed, comparing 
Eq. (3.1) to the continuous orthonormality relation, J M 4> n 4> n ' — 5 n n', it becomes 



apparent that integrals should become weighted sums, 

(3.2) f /->y;5«/(i). 

JM \ 

Accordingly, for a function / : V — > R, we define its average value over a subset of 
vertices R C V as the weighted average: 



Ave{f\R} = 



Based on this formula, it is natural to define the volume of R by Vol(R) — J2i^R 
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3.3. Embedding. Our construction will interact with the graph structure through 
its embedding into a high dimensional Euclidean space. Consider the map % : V — > 
R 1 ^^ 1 defined by the formula 



(3.3) H(i) = 



<j>i(i) toji) 4>N-i{i) 

v A l A 2 A JV-1 , 



to which we will refer as the p-harmonic embedding of the graph. Note that this 
is similar to the Diffusion Map [4j with the difference being in how the individual 
eigenvectors are scaled, by a factor of A„ p ^ 2 here, versus e~ tA ™ for the Diffusion 
Map. When p = 2, we obtain the biharmonic embedding introduced in the case of 
the surface meshes in 

The relevance of the p-harmonic embedding becomes clear due to the following 
connection. We will need a discrete equivalent of the p-harmonic Green's kernel, 
which one naturally defines similarly to the continuous case as: 

N-l 



An 

It is easy to see that the following holds, 

G(i,j)=H(i) -H(j), 

where the dot signifies the usual dot product of vectors. 

In practice two issues need to be addressed. First, for efficiency reasons, it is 
impractical to compute the complete eigen-decomposition of the Laplacian, and so 
we truncate the embedding to only include the eigenvectors corresponding to the 
smallest few-hundred eigenvalues. Second, a choice of p should be made. While 
one can use this as a tuning parameter, we recommend setting p in such a way 



that the denominator in the embedding of Eq. (3.3 1 grows at least linearly in n. 
In the point cloud case where the discrete Laplacian approximates the Laplace- 
Beltrami operator of a smooth manifold M, we know by the Weyl estimate that 
A n ~ n 2/Am(M) T/hus, to make the denominator of at least linear order one needs 
to set p > dim(M). Various techniques for estimating the dimensionality of a point 
cloud are available, for our purposes a crude estimate (perhaps, based on fitting 
a line to the log-log scatter-plot of (n, A„) on the plane) will suffice. Setting the 
value of p too high, will effectively discard the higher eigenvectors, and will result 
in numerical stability issues at the very fine levels, as there will not be enough 
variation in the kernel to provide enough degrees of freedom for interpolation. 
Finally, we note that a straightforward generalization of our construction comes 



by using other embeddings instead of the embedding of Eq. (3.3 1. Our specific 
choice was driven by the formula for the interpolant in the continuous case which 
requires the p-harmonic Green's kernel. This kernel, in turn, arises due to our 
choice of the functional to minimize. If a different embedding is used, implicitly the 
minimized functional will assume another form. However, in practice, the explicit 
form of the functional is not needed in order to complete the construction, the only 
entity that our construction interacts with is the embedding one chooses to use. 

3.4. Partitioning. As in the continuous case, in order to realize the average- 
interpolating wavelet scheme we first need to construct a dyadic partitioning of 
the graph G. While a variety of graph partitioning techniques exist, we base our 
partitioning on the spectral clustering algorithm of [H] as applied to the embedding 
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of the Eq. (3.3). To obtain a binary tree of partitions, we start with the graph 
itself as the root. At every step, a given region (a subset of the vertex set) of graph 
G is split into two children partitions by running the 2-means clustering algorithm 
(fc-means with k — 2) on the embedding of Eq. (3.3) restricted to the vertices of 
the given partition. This process is continued in recursion at every obtained region. 

In practice two issues need to be addressed. First, due to the finite size of the 
graph, we cannot continue partitioning the graph ad infinitum. The maximum 
number of levels, l ma x, should be chosen based on the number of graph vertices. 
We usually set the number of levels as l max — \\0g2 N\ ; smaller values can be set if 
efficiency is a consideration or if a less detailed analysis of data is needed. Second, 
we did not let the non-leaf regions (i.e. regions at levels I < l ma x) to become 
less than a certain size (2 vertices in our experiments). As a result, we would get 
nodes in the binary tree which have a single child (i.e. a region would be small 
enough so that it is not split anymore). On the other hand, the regions at level 
lmax would each contain a single node of the graph. Thus, in reality, our tree is not 
strictly a binary tree at the level l m ax- The redundancy in the average-interpolating 
pyramid for a binary partition tree allows one to keep one wavelet coefficient at each 
parent node. To accommodate non-binary trees in our implementation, we keep the 
wavelet coefficients at the children nodes (namely, we keep oti.k at node Ri+i_2k, 
then a[ k at i?z+i,2fc— 1 , and so on if there are more children) which makes our 
representation redundant. When modifying the wavelet coefficients (e.g. in order 
to achieve denoising), one has to make sure that the straightforward generalization 
to non-binary trees of the redundancy relationship of Eq. (2.3) holds. 



3.5. Interpolation. The second ingredient in the average-interpolating wavelet 
scheme is interpolation. To achieve interpolation on graphs, we simply modify the 
formulas provided in the continuous case. First, the p-harmonic Green's kernel 
G{i, j) is computed using the truncated p-harmonic embedding described above. 



Now, the interpolant is given by the formula of Eq. (2.7) where the coefficients 
are computed using the same linear system of equations as in the continuous case. 
When setting up this system of equations, we replace all of the integrals by weighted 



sums according to the rule in Eq. (3.2) 



The use of the truncated embedding can be interpreted as follows. Suppose 
that in the embedding we are using only the eigenvectors up to index n max . One 
can prove that the obtained interpolant gives the minimizer of the functional 
r M (A p / 2 7r) 2 among tt £span{<j)o, <j>\, </>n mox }, subject to the averaging constraints. 
This means that our interpolant 7r is a linear combination 

7T = a + CLl4>l + a 2 <t>2 + ••• + CL nmax <j) nma!c , 

and among all such linear combinations that satisfy the averaging constraints, we 
choose the one that minimizes 

f Umax 

(3.4) / (A^tt) 2 = £ a n Xl. 

While it seems that one could directly solve this optimization problem without 



any recourse to the solution formula of Eq. (2.7), in practice we found that the 



solutions obtained using such direct methods may become numerically unstable 
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and lead to discontinuities in the scaling functions On the other hand, this 
formulation of the problem bears more resemblance to the classical approach of 
Donoho [7] . Indeed, in the setting of the real line, the solution is sought as a linear 
combination of the monomials, and the averaging constraints are satisfied by a 
unique linear combination of the monomials. In our setting, monomials are replaced 
by Laplace-Beltrami eigenfunctions, and due to the variable number of neighbors 
there will be many linear combinations that satisfy the averaging constraints. To 
single out a linear combination among them we pick one that minimizes a measure 



of smoothness as given by the expression in Eq. (3.4 1. 

One issue that arises during interpolation when dealing with point clouds is the 
question of which regions should be considered neighbors. The simplest approach 
would be to declare two regions neighbors if there is an edge connecting a vertex 
from one region to a vertex from the other. Note that the graph structure for point 
clouds is inferred from the point cloud's Laplacian. These Laplacians may have 
rather large stencils: for example, in one approach, the stencil at a given point 
contains a fixed number, sometimes 50-100, of its nearest neighbor points. If this 
is the case, then even a region containing two points will have a large number of 
neighboring regions, which can be counter-intuitive to the notion that the point 
cloud is a representation of an underlying low-dimensional manifold; this will also 
prevent the wavelets from having supports smaller than a certain size. To circum- 
vent this issue, we look at the total weight sum for all edges running between the 
two regions, and if this "cut size" is large enough, then we declare the two regions 
to be neighbors. For each region, we set the minimum threshold on the cut size 
equal to a fraction of the largest cut size at the given region. 

4. Numerical Examples 

In order to investigate the empirical properties of the average interpolating con- 
struction introduced in this paper, we ran a set of experiments using a number of 
synthetic and real data sets. For the experiments, we compute and visualize the 
scaling functions at different levels I. There is one scaling function for each region 
Rl k, which can be obtained by setting the function average at that region R[ j. 
equal to one, and then applying the refinement process. To get scaling functions 
supported at similar locations, we pick a point on the graph, and at each level show 
the scaling function corresponding to the region that contains the picked point. 
For the ease of visualization, all functions are scaled to have the maximum abso- 
lute value of 1 and the following color coding is used: dark red represents larger 
positive values, neutral green corresponds to zero, and dark blue represents larger 
(in absolute value) negative values. Wavelets are not shown because the wavelet at 
a given parent region is simply the weighted difference between the scaling functions 
corresponding to the two child regions. In all of the experiments we used the value 
of p = 2. In all of the examples, except the one-dimensional case, we only look at 
the immediate neighbors of a region, out of which we retain only the ones that have 
large enough cut size (at least 1/8 of the largest cut size at the given region). All 
presented example are based on a single vanishing moment. 

2 For example, we found the following direct approach to lead to discontinuities. First, eliminate 
ao using one of the averaging constraints. Next, rescale the unknowns so that the expression that 
is minimized is simply the sum of squares of the unknowns (b n = a„A^ 2 ), and then use the 
pseudo-inverse to solve the remaining averaging constraint equations. 
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Figure [T] shows the scaling functions in one-dimensional case. The discrete Lapla- 
cian L is obtained using the three point stencil [— 1, 2, — 1]. Our subdivision process 
reproduces the dyadic subdivision of the interval up to a small numerical error. Dif- 
ferent scaling functions result based on how many neighbors of a given interval are 
used during interpolation. For example, if one uses the immediate neighbors of an 
interval only, the function shown in Figure [l] (a) results; but if both the immediate 
neighbors and their neighbors are used, then Figure [l] (b) results. In the original 
construction of Donoho, the first case would require a quadratic polynomial, and 
the second case - a quartic polynomial, and so on. In our construction, indepen- 
dently of the number of neighbors, the form of the interpolant stays the same. As 
a result, except for Figure [l] (a), our scaling functions are different from Donoho's, 
which can be seen by examining the empirical two scale relation coefficients of our 
construction. Note that as in the original construction, increasing the number of 
neighbors results in smoother but less localized scaling functions. 

Figure [2] (a) depicts a graph representing the road network for Minnesota, with 
edges showing the major roads and vertices being their intersections. Every edge 
has unit weight, while every vertex's weight is set equal to its degree; as a result, 
the Laplacian L equals to the normalized graph Laplacian. The obtained scaling 
functions at different levels are shown in Figures [2] (b-f). 

Figure [3] shows the scaling functions on the "Swiss Roll" , a 2-dimensional mani- 
fold with boundary. Here, it is represented as a uniformly sampled point cloud and 
the Laplacian is computed using the point cloud Laplacian of [4|. Since the ingre- 
dients used in our construction are intrinsic, the scaling functions are supported on 
contiguous regions along the manifold. 

Figure [4] depicts the scaling functions on a planar domain with irregular bound- 
ary. Once again, this is considered as a point cloud and the Laplacian of £Q is used. 
Note that the supports of scaling functions adapt to the shape of the domain by 
initially spreading within the "bulb" before starting to grow into the narrow region 
connecting the "bulbs". This behavior is similar to how spectral distances, such as 
the diffusion distance, behave. 

In Figure [5] we exemplify a straightforward application of our construction to 
smoothing a noisy function on a manifold. We added Gaussian noise to the smooth 
function on the sphere in Figure [5] (a) , to obtain the function in Figure [5] (b) with 
the SNR of 1.24 dB. We compute the wavelet transform of the function in (b) and 
then set all of the wavelet coefficients after certain level (I > 5) equal to zero. 
Inverting the transform, we obtain the function depicted in [5] (c) with the SNR of 
18.2 dB. Clearly, more sophisticated wavelet denoising approaches can be applied 
in general. 

Another simple application of our construction is to the problem of regression 
on manifolds. For a function given on the S-manifold in Figure [6] (a), we randomly 
sample 75 points shown in Figure [6] (b) . The function values at these sampled 
points are used to reconstruct the original function on the entire manifold using 
the following procedure, similar in the spirit to that of [8j. First, whenever a region 
contains at least one of the sampled points where the function value is provided, 
we can estimate the function's average at that region using these sample values (in 
the obvious way). These averages are put in the pyramid, and then we compute 
the wavelet coefficients a Uk ~ (A+i,2fc - Pi+i,ik) an d a' l k ~ (A+i,2fc-i - A+i,2fe-i) 
whenever all of the involved terms can be estimated. This requires that the children 
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regions Ri+i.2k and i?z+i,2fc— i, the parent region Ri,k, and the neighbors of the 
parent used for interpolation all contain at least a single sampled point with a 
known function value. All of the wavelet coefficients that cannot be estimated 
are set to zero. In the final step, we need to reconcile the values of a^k and 
a[ k so that the redundancy relationship ai^ + a', fc = is satisfied in the entire 
pyramid; this is achieved by setting ai t k — (&l,k — a ik)/^ anc ^ a i k ~ ( a ik~ 
a/jt)/2. The reconciliation step is need because the sampled points may distribute 
unevenly between the regions, and so the estimated averages in the pyramid may 
not satisfy the redundancy relationship of Eq. (2.1 ) that would have been satisfied 
for the actual averages. Next, running the backward wavelet transform, we obtain 
the reconstructed function. In the example provided here, this process yields the 
function shown in Figure [6] (c); the normalized RMSE of this reconstruction is 
2.83%. 



5. Summary and Future Work 

We introduced a multiscale construction based on a generalization of the average 
interpolating scheme of Donoho [7] . The construction yields wavelets suitable for 
analyzing functions on graphs and point clouds. Our main goal in this paper was to 
call the reader's attention to the possibility of such a construction by motivating it 
in the continuous case and adapting to the discrete case. As a result, this work pro- 
vides only a small, first step and has limitations that suggest topics for future work. 
A first topic suitable for further investigation is to more formally characterize the 
theoretical properties of the construction in the continuous case. For example, one 
observes experimentally that our refinement process converges to yield continuous 
scaling functions. Providing a rigorous proof of this observation would be interest- 
ing. A second topic for future research would be to develop theoretical bounds on 
the decay of the wavelet coefficients in the discrete case. In fact, it seems that some 
of the results from [5] should easily generalize to our setting. Finally, it would be 
interesting to investigate applications of the construction to areas such as machine 
learning, image processing, and computer graphics. 
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Figure 1 . Scaling functions on the real line at a single level (scale) 
for different choices of the neighborhood order. 
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(a) Minnesota road graph (b) A scaling function at level I = 4 




(c) A scaling function at level I = 5 
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(d) A scaling function at level I = 6 
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(e) A scaling function at level I = 7 (f) A scaling function at level £ = 8 

Figure 2. Scaling functions at different levels on the Minnesota 
road graph. 
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(a) A scaling function at level I = 2 




(e) A scaling function at level I = 6 
Figure 3. Scaling functions at 




(b) A scaling function at level I = 3 




(d) A scaling function at level I — 5 




(f) A scaling function at level I = 7 
different levels on the Swiss Roll. 
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(a) A scaling function at level I — 1 (b) A scaling function at level I = 2 






(c) A scaling function at level I = 3 (d) A scaling function at level I = 4 





(e) A scaling function at level I — 5 (f) A scaling function at level I = 6 



Figure 4. Scaling functions at different levels on an irregular pla- 
nar domain. 
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(a) A smooth function (b) Gaussian noise added (c) The result of smoothing 
Figure 5. Smoothing out white noise using wavelets on the sphere. 




(a) Original function (b) Sampled points (c) Reconstruction 

Figure 6. Reconstructing a function from its values at a subset 
of points on the S-manifold. 



